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Algorithms for Lattice QCD with Dynamical Fermions 

A. D. Kennedy'' 

'^School of Physics, University of Edinburgh, King's Buildings, Edinburgh, EH9 3JZ, United Kingdom 

We consider recent progress in algorithms for generating gauge field configurations that include the dynami- 
cal effects of light fermions. We survey what has been achieved in recent state-of-the-art computations, and 
examine the trade-offs between performance and control of systematic errors. We briefly review the use of 
polynomial and rational approximations in Hybrid Monte Carlo algorithms, and some of the theory of on-shell 
chiral fermions on the lattice. This provides a theoretical framework within which we compare algorithmic 
alternatives for their implementation; and again we examine the trade-offs between speed and error control. 



1. Introduction 

The aim of this review is to provide a snapshot 
of the present status of dynamical fermion simula- 
tions, both as regards their performance and the 
possible sources of systematic errors that are not 
fully under control, as well as an introduction to 
some of the algorithmic ideas that are currently 
being investigated. As the goal is to be pedagog- 
ical rather than exhaustive we do not attempt to 
summarise all of the algorithmic talks that were 
presented at this conference. 

In ^ we consider the present state of large- 
scale dynamical fermion computations, compar- 
ing their costs and their control of systematic er- 
rors. This leads us to consider the issue of the 
locality of using fractional powers of the fermion 
determinant in ^ In f^l^e consider various al- 
gorithms for introducing "fat links" in a differ- 
entiable way. In fjSl we review a new algorithm 
which allows a domain-decomposition approach 
to be used for lattice field theories. In fjHlwe out- 
line the theory behind some of the algorithms that 
are being used to include fractional powers of the 
fermion determinant; the same techniques may be 
applied to evade instabilities in numerical inte- 
grators (0, as can some other clever tricks (fJHl)- 
Finally, we consider in some detail algorithms for 
including the dynamical effects of on-shell chiral 
fermions in ^ and in particular emphasise that 
all the methods used may be considered to be dif- 
ferent approximations to the Neuberger operator. 



2. Numerical Simulations under 
"Battlefield Conditions" 

Before comparing the performance and efficacy 
of the various dynamical fermion algorithms cur- 
rently being used in large-scale numerical compu- 
tations there some caveats that must be empha- 
sised. 

Since such computations are extremely expen- 
sive they have only been carried out at a small set 
of parameter values, and thus we cannot reliably 
compare their results or behaviour in either the 
continuum or thermodynamic (infinite volume) 
hmit. Indeed, there are usually only results for 
two or fewer different lattice spacings. 

For the same reason it is not possible to com- 
pare the performance of the algorithms at the 
same physical quark masses; for this we would 
need to interpolate between runs with different 
dynamical (not valence) quark masses. 

Some computations have been carried out with 
two quark flavours and others with three, and 
again we are forced to ignore these differences due 
to lack of data. Since the cost of adding in a third 
heavy dynamical quark is usually small compared 
to the cost of the two lightest quarks this prob- 
ably is not too important within the large error 
bars. 

It is very hard to make reliable estimates of the 
autocorrelation times involved, and thus to esti- 
mate the cost of generating statistically indepen- 
dent configurations from an equilibrium distribu- 
tion. Not only do results depends on the operator 
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whose integrated autocorrelations are measured, 
but also on the physics of the system, such as the 
presence of nearby phase transitions. 

Computations have been carried out on differ- 
ent lattice sizes, so we obviously need to com- 
pare the "cost per unit volume." ft would be 
wrong to assume naively that the cost of all al- 
gorithms scale linearly with the number of lat- 
tice sites Vjaf^ at fixed lattice spacing a: while 
the R algorithm f ^6.1|l cost scales as V for fixed 
(volume-independent) 0((5t^) errors in the pa- 
rameters of the effective action being simulated, 
the cost of the Hybrid Monte Carlo (HMC) al- 
gorithm^ grows as V'^^^ . Fortunately the effects 
of such different volume extrapolations are small 
compared to the errors in the cost estimates any- 
how. 

Figure n shows the approximate cost of evolv- 
ing a 16"^ X 32 lattice through one unit of MD 
time as a function of the dynamical tt mass 
achieved. Four types of fermion action have been 
used recently for large-scale computations. Wil- 
son/Glover fermions seem to increase in cost as 
the vr mass is reduced much more rapidly than 
the other formulations; indeed, there are runs 
at smaller tt mass which are so expensive that 
they lie significantly above the graph. It may be 
conjectured that this is because the Dirac spec- 
trum not bounded below, which is not the case for 
KS/Staggered fermions (where recent data from 
the MILC collaboration use the ASQTAD improved 
action), twisted mass (QCD^") fermions, or do- 
main wall (GW/Overlap) fermions. The question 
of which formulation on-shell chiral fermions is 
best, or indeed whether such formulations arc to 
be preferred at all, depends on how much chi- 
ral symmetry is required. At present dynami- 
cal GW fermions seem to be about 10-100 times 
more expensive than the corresponding ASQTAD 
or QCD^" runs at comparable masses. 

The figure also indicates subjective extrapola- 



^This is because it is necessary to scale the molecular dy- 
namics (MD) step size 5t oc V^^"^ in free field HMC for a 
constant acceptance rate. We shall see later (ijTJ that for 
HMC computations involving light fermions the step size 
5t is constrained by instability in the integrator rather 
than by the bulk behaviour of the HMC Hamiltonian, so 
a V dependence might be more appropriate for HMC too. 
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Figure 1. Comparison of cost of various algo- 
rithms. The Wilson fermion data is taken from 
Pg, the ASQTAD data from [HE], the QCD^" 
data from |S| , and the domain wall data from . 
The scatter of the points gives a rough indication 
of the uncertainties in the cost estimates. For 
the QCD^" fermions there are number of points 
at the same value for the tt mass, because the 
system shows a first-order phase transition with 
different costs in each phase |7|. The bands are a 
subjective extrapolation of the costs. 



tions for the costs. In particular it seems rea- 
sonable that the cost for dynamical domain wall 
fermions, or any other formulation exhibiting ex- 
act on-shell lattice chiral symmetry, should even- 
tually become cheaper for light enough quarks be- 
cause the chiral limit for them does not require 
taking the continuum limit. ^ 

3. Locality 

There has been much debate recently about 
whether dynamical fermion formulations that 
weight configurations with a non-integral power 
of the fermion determinant correspond to local 
quantum field theories (QFTs) or not, and how 
significant locality is. For example, staggered 
quarks appear in multiples of four "tastes;" when 
the staggered fermion fields are integrated over we 
are left with a fermion determinant det M in the 

^Of course we need to be close enough to the continuum 
limit in all cases to extract real-world physics. 
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functional integral. This is replaced by (det A^)^^^ 
to obtain a corresponding two taste theory, and 
gauge configurations may be generated with this 
weight using inexact (R) fi lti.f |) or exact (PHMC 
or RHMC) algorithms (301 • 

If a QFT is local then we are guaranteed that 
it has the cluster decomposition property, and 
that within the context of renormalized perturba- 
tion theory it satisfies the familiar power counting 
rules, exhibits universality, and is amenable to a 
systematic improvement procedure. On the other 
hand, if it is not local then there is little we can 
say about these properties other than that we we 
have no a priori reason to expect them to hold. 
In particular, if we have a non-local lattice theory 
then we have no good reason to invoke power- 
counting arguments to justify taking the nai've 
continuum limit, or to expect the lattice theory 
to be described by continuum perturbation the- 
ory however small the lattice spacing. 

The fact that a formulation is not manifestly 
local does not logically imply that it is not local. 
After all, it is easy to transform a manifestly lo- 
cal theory into an equivalent but non-manifestly- 
local form: we use this freedom to replace 
fermion fields with a non-manifestly-local fermion 
determinant, or an equally non-manifestly- local 
pseudofermion representation. For the case of two 
tastes of staggered fermions there may be a local 
fermion kernel A4' such that detAi' = V det A4 , 
However, in general a non-manifestly local theory 
has no reason to be equivalent to a local one. 

Even if there was a local action correspond- 
ing to taking fractional powers of the fermion 
determinant in the functional integral, we are 
still required to use this local action to measure 
fermionic quantities. For the case of staggered 
quarks this means one must use the hypotheti- 
cal A4' rather than A4 for valence measurements. 
We should not expect that measuring operators 
corresponding to a local four taste valence action 
Ai on configurations generated with \/det A4 to 
lead to consistent results. Not only might there 
be unknown renormalisations of the parameters 
between the sea and valence actions (e.g., what 
is the justification for using the same numerical 
value for the quark masses?) but the degrees of 
freedom are not even the same. Unless we know 



A4' explicitly or are only interested in measur- 
ing purely gluonic observables we are forced into 
having a "mixed action," that is different sea and 
valence actions, and such a situation suffers from 
the same problems, such as violation of unitar- 
ity, as quenched QCD. Of course, one might hope 
that these problems do not occur, or at least are 
smaller than for an arbitrary mixed action situa- 
tion, but these systematic errors are not quanti- 
tatively under control. 

3.1. Is VM Local? 

Some recent publications 8 9^ have studied the 
question as to whether the operator \/JA is in 
fact local. To make this question meaningful we 
need to specify which square root is under con- 
sideration. If the square root of the lattice op- 
erator is defined by taking the square roots of 
its eigenvalues in a basis in which it is diagonal 
then the sign of these roots is arbitrary. In the 
tests that have been carried out the square root 
has been defined by a polynomial approximation, 
which corresponds to choosing the positive square 
roots. Suppose we applied this procedure to the 

Wilson Dirac operator '\Jt>wt>w = V (ts^vk)^; 
we would find that the square root was non-local, 
because the local square root (750^y) is not a 
positive operator. 

It is important to stress that the fact that VaT 
appears to be non-local does not prove that there 
is no corresponding local operator A4' with the 
same determinant; on the other hand the only op- 
erator we know with this determinant (and with 
staggered lattice symmetry) is VJA itself, so it is 
worthwhile verifying that it is not (miraculously) 
local as has sometimes been suggested as a possi- 
bility. After all, the Neuberger operator appears 
to be local on sufficiently smooth gauge configura- 
tions PO], despite the fact that it is not manifestly 
local. Hoping for the existence of Ai', or even 
proving it, is insufficient: we would need to use 
this operator for valence measurements in order 
to have a manifestly consistent unitary theory. 

Let us now briefly review how the locality of 
a lattice Dirac operator may be measured nu- 
merically. We define a "wavefunction" by ap- 
plying a lattice Dirac operator to a point source, 
V'(a;) = a0{x,y)5{y). We then say that the cor- 
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responding QFT is ultralocal if the wavefunction 
has "compact support," = OV||a; — > r, 

and is local if the wavefunction has "fast de- 
crease," niax||2._j^||=j, < Ce~''/'''°'', with ri„^ 
being the localisation length. Note that in both 
cases the wavefunction falls to zero at any non- 
zero physical distance in the continuum limit, 
thus corresponding to a local continuum theory. 
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Figure 2. Effective localisation length rioo in phys- 
ical units as a function of physical distance for the 
2-taste ASQTAD action, for a variety of different 
lattice spacings [HI- 



Typical results for the effective localisation 
length a {ro\n[ip{r)/ip{r + a)]} ^ are shown in 
Figure HI as a function of the physical distance 
(ro is the Sommer scale). The effective localisa- 
tion length does not reach a plateau in the region 
where we may hope that finite-size effects are un- 
der control, so we cannot extract a definite esti- 
mate of Tioc, on the other hand the evidence is 
that it must be non-zero at a non-zero physical 
distance, independent of a, so the operator cannot 
be local in the continuum limit. Furthermore, the 
magnitude of the localisation length is compara- 
ble with the inverse n mass, and thus use of VaT 



would not lead to meaningful (valence) spectrum 
measurements. 

The reader is also referred to interesting related 
work in the Schwinger model 

4. Fat Links 

We shall now survey the topic of "fat links." 
The idea of replacing gauge link variables by some 
average of their neighbours is not new, and for a 
long time has been used to construct operators 
that are good sources and sinks, in the sense that 
they have a better overlap with ground state. The 
basic idea is that instead of explicitly construct- 
ing a more complicated less local improved oper- 
ator we use the simplest operator with the cor- 
rect symmetries (quantum numbers) but taken 
as a function of the smeared or fattened link 
fields. The fattening procedure is thus equiva- 
lent to building a physical size smeared source. 
Of course, to calculate masses and so forth we 
need to measure temporal correlation functions, 
and the sources and sinks for these should be lo- 
calised in time, so we typically want to use spatial 
link fattening only in this case. 

Another use of fat links which is becoming im- 
portant as dynamical fermion computations be- 
come commonplace is that of smearing the ac- 
tion itself, in this case by computing the orig- 
inal action as a function of the fattened link 
fields. Here the goal is to suppress UV fluctua- 
tions in the gauge fields. This may be considered 
as a means of constructing improved actions in 
the spirit of renormalisation-group improved (or 
"perfect") actions, or systematic improvement in 
powers of a (either perturbative or not). 

There have recently been two main applications 
of fat links: improving gauge actions so as to pro- 
duce smoother gauge fields upon which quenched 
GW Dirac operators are localised, and improving 
fermionic actions themselves. We shall consider 
the latter here, as we are concerned with dynam- 
ical fermion algorithms. The problem with us- 
ing a highly improved action such as HYP or 
perfect actions ^2] in dynamical computations is 
that we either need to find non-MD-based algo- 
rithms for generating the gauge configurations in- 
cluding the effect of the fermion determinant, or 
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we need to compute the MD force corresponding 
to the improved actions. 

4.1. Molecular Dynamics with Fat Links 

The most successful fat actions, such as HYP, 
used in quenched computations have been those 
where the fat Unks are somehow projected onto 
the group manifold. Indeed, this seems to be 
a crucial ingredient for their success. Unfortu- 
nately this projection would seem to be a non- 
differentiable operation (i.e., the fat links are not 
a smooth function of the original thin links). The 
first attempt to show that this was not necessarily 
so, and indeed that the MD equations of motion 
for fat actions could be practicable, was using the 
FLIC (Fat Link Improved Clover) action [H] . 

Let V be the usual unprojected APE-smeared 
link, then the usual iterative projection V ^ U 
where U G SU(3) maximizes RetiUV^ is not 
differentia ble; h owever by defining successively 
W = V/VVVf and U = W/\/detW instead we 
obtain a mapping that is almost differentiable. 
The "almost" is that differentiability fails along 
the branch cut of the cube root, but this does not 
seem too important in practice. 



links seem to be about as good as ordinary pro- 
jected fat links, but require a little more tuning. 

These methods can be applied iteratively to 
produce differentiable links of arbitrary obesity. 

5. Schwarz Alternating Procedure 

An interesting new algorithm based on the 
Schwarz Alternating Procedure (SAP) has been 
introduced by Liischer both as a preconditioner 
for solving the lattice Dirac equation jj^l and 
as dynamical fermion Monte Carlo algorithm |17l 
HH]. The SAP, introduced in 1870, was probably 
the first domain decomposition method for solv- 
ing the Dirichlet problem for elliptic partial dif- 
ferential equations in complicated domains. The 
inequalities used in the convergence proof of the 
SAP for PDEs rely on the elliptic nature of the 
systems. 
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Figure 4. SAP domain decomposition. 



Figure 3. Definition of stout links. 

Another "projection" procedure which is truly 
differentiable is that of "stout links" |15|. Let 
y be a suitably smeared link and Uq the corre- 
sponding thin link, so that VUq is an untraced 
sum of plaquettes. This may be "projected" onto 
SU(3) by first projecting it onto the su(3) alge- 
bra and then exponentiating it into the group, 
U — exp T{Vuf), where T means the traceless 
antihermitian part. This is now differentiable, 
but it does not look too much like projection (ex- 
cept when VUq « 1). In quenched tests stout 



The basic idea is to decompose the lattice into 
blocks, and then update each block with bound- 
ary conditions specified by the most recent fields 
on the neighbouring blocks, as illustrated in Fig- 
ureEl In the figure 6^ blocks are shown, but when 
considering the scaling behaviour of the algorithm 
the blocks should be taken to be of fixed physical 
size. 

5.1. SAP Preconditioner for Linear Solver 

When we invert the Dirac operator on the lat- 
tice we modify the (pseudo)fermion fields in a 
fixed gauge background. For a nearest-neighbour 
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fermion kernel, such as the Wilson Dirac opera- 
tor, the fermion fields in the even blocks are cou- 
pled only to those in other odd blocks and vice 
versa. We can thus alternate updates of all the 
even and all the odd block fields, giving a proce- 
dure that has a large degree of parallelism. Notice 
that we are not just alternating single updates 
but we are alternating complete solutions of the 
Dirichlet problem on each set of blocks. 

This procedure converges to the correct solu- 
tion, but the rate of convergence is slow; so in- 
stead of using SAP as a solver we may use it as a 
preconditioner for a KpLiJiOB space solver (such 
as GCR or FGMRES). For this purpose accurate 
block solves are not required, as the precondi- 
tioner need only be an approximation to the in- 
verse, and only a few Schwarz cycles are required. 
This method seems to lead to a significant speed- 
up by a factor of 2-3 over a range of quark masses, 
as illustrated in Figure The algorithm paral- 
lelises easily, especially on coarse-grained archi- 
tectures such as PC clusters. 
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Figure 5. SAP preconditioner performance. The 
data is for a 48'^ x 24 lattice with a = 0.10 fm, and 
quark masses between 0.2 and 0.7 of the strange 
quark mass. The timings were obtained on 8 
nodes (16 processors) of a PC cluster with a block 
size of 6^ X 4^ and a residual of 10~^ relative to 
the source. 



5.2. Dynamical Fermion MC algorithm 

A more radical use of the SAP is as a dynamical 
fermion algorithm itself, in which we update the 
gauge fields in the blocks. 

The pseudofermion term in the action in- 
volves the inverse of the lattice Dirac opera- 
tor, and is thus manifestly non-local. In order 
to allow the update of a subset of the gauge 
links we make use of the Schur complement fac- 
torisation of the quark determinant: det0 = 
det T/)^ det pfj. det (l - ) . We 

indicate the grey and white block of Figure ^ by 
O and O* respectively, and exterior boundaries 
(indicated by white dots in the figure) by d^l and 
dn*. This decomposition is easily derived from 
the UL factorisation 
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We may now introduce two pseudofermion fields, 
(j) with support on the interior of the blocks 
and X with support on the boundary; the as- 
sociated pseudofermion part of the action being 
S,, = 0t (0^-2 + 0^-2^ ^ + X^R-^X, with R be- 
ing the Schur complement operator introduced 
above. The gauge fields on the "active" links 
within the blocks together with the cj) pseudo- 
fcrmions can be equilibrated with x held fixed, 
and then the effect of the "Schur complement" 
interaction R can be incorporated into a global 
HMC process. We occasionally shift the blocks 
to ensure all links get their turn to be active and 
thus updated. 

According to the preliminary results that have 
been presented, this scheme successfully separates 
the short- and long-distance effects. Within the 
blocks the smallest eigenvalues of the Dirac op- 
erator are regulated by the inverse block size, 
whereas for the force due to the x pseudofermions 
is small and only weakly mass dependent in the 
interior of the blocks; this allows larger step sizes 
for the global HMC process. This looks very 
promising, and it will be interesting to see how 



^In practice it is useful to use even-odd preconditioning 
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the costs scale for very small quark masses. So 
far the procedure has been applied to Wilson 
fermions, and it might be painful to generalise 
it for those fatter fermion actions for which the 
fraction of "active" links becomes smaller. 

6. Algorithms for Fractional Multiplets 

Previously in ^J^lwe discussed the validity of us- 
ing a fractional power of the fermion determinant 
in order to circumvent the doubling problem for 
staggered fermions. In this section we shall ignore 
such matters of principal and consider the avail- 
able repertoire of practical dynamical fermion al- 
gorithms available for this case. 

6.1. R Algorithm 

To date, all large-scale computations with two 
or three tastes of staggered quarks have used the 
R algorithm This is an inexact algorithm, 

in the sense that it is a Markov process that con- 
verges to an equilibrium distribution that only 
approximates the desired one. It is based on the 
alternation of MD trajectories and Gaussian mo- 
mentum refreshment, but with no accept /reject 
step to correct for non-zero step size errors. By 
using a clever combination of non-reversibility 
and area non-preservation in the MD integrator, 
the R algorithm converges to a distribution with 
errors of 0{St^) in the parameters of the action. 
As already noted in f^] these 0{6t^) errors are 
volume independent. 

It is important to note that this result is an 
asymptotic expansion in St, so there are also er- 
rors of the form e"^/""''^ which are only small if 
the quantity mdr 1 where m is some suitable 
scale parameter. Indeed, we may expect these 
non-analytic errors to become dominant at the 
same St for which the MD becomes unstable (fQ; 
we should not expect to be able to use a larger 
step size than for the corresponding exact HMC 
algorithms, except that instead of getting a tiny 
acceptance rate we would just quietly get an in- 
correct answer. The observation that the esti- 
mate of the errors in the equilibrium distribution 
is an asymptotic expansion* tells us that we have 

*C.f., perturbation theory is also only asymptotic (or 
worse), as is the "improvement" expansion in the cut-off 



no reason to expect the results for large St to just 
correspond to a renormalisation of the parame- 
ters, as was originally suggested for the related 
Langevin algorithm. 

6.2. PHMC and RHMC 

Relatively recently the Polynomial |2()I21| and 
Rational ^2 23. 24 Hybrid Monte Carlo algo- 
rithms have been developed. These permit the 
simulation of fractional powers of the fermion de- 
terminant without any concomitant systematic 
errors in the equilibrium distribution. 

The basic idea is to use a polynomial or rational 
function approximation to the power occurring 
in the action. Recall that in the context of the 
HMC algorithm an approximate action suffices 
for MD evolution, and an accurate action is only 
needed for the acceptance test. This permits us 
to use a relatively cheap approximation for each 
MD step, and a more expensive approximation 
for the infrequent acceptance tests. 

The fermion kernel A4 on the lattice is a 
large sparse matrix, and we wish to use its in- 
verse square root, or more generally some con- 
tinuous function f{M) of it. We define such 
functions on Hermitian matrices by diagonalisa- 
tion, if = UVU^^ where V is diagonal then 
f{M) = f{UVU^^) = Uf{V)U-\ The key 
property of polynomial and rational functions is 
that they do not require the diagonalisation to 
be carried out explicitly, since aAi"^ + (3A4" — 
C/(aI?" -t- I3V")U-^ and TW-i UV-^U-^. 

6.3. Me6L.imeB Approximation 

The theory of optimal L^o (HeGbinies) approx- 
imation over a compact interval is well under- 
stood: HeSLinies's theorem tells us that there is 
a unique optimal approximation characterised by 
having alternating error maxima of equal magni- 
tude (c.f., Figure|n|). In general we can determine 
the coefficients of the optimal approximation nu- 
merically using PeMca' algorithm, but for the in- 
teresting cases of sgn(x) and x^^^^ they are have 
been calculated in closed form in terms of elliptic 
functions by BojiOTapes. 

a. Indeed, we may reasonably expect scaling for more 
highly-improved theories to break down at smaller values 
of a. 



8 



sn(z/M.X) 




sn(z,k) 



Figure 6. The optimal Me6i.imeB approximation 
to the function sgna; for £ < |a;| < 1, iUustrat- 
ing the characteristic alternating extrema of the 
error. In this case the coefficients of the optimal 
rational approximation were found in closed form 
by 3ojiOTapeB in terms of Jacobi elliptic func- 
tions, as indicated by the labels on the axes. 



tion has an error that only falls as 1/n. The 
rational approximation's exponential convergence 
means that it can be made exact to machine pre- 
cision in a numerically stable and cheap manner. 
The obvious advantage of a polynomial approx- 
imation is that it does not require the numeri- 
cal solution of systems of linear equations, but 
bear in mind that the polynomial approximation 
to corresponds to computing the matrix in- 
verse using the Jacobi method, as compared with 
the more efficient KpujiOB solvers that are used 
in the computation of rational approximations. If 
the rational functions are expressed as a partial 
fraction expansion, and a multi-shift solver ,251 
is used to compute all the necessary inverses si- 
multaneously in the same KpBiJiOB space, then 
rational approximations are very competitive. 



It is not necessary to use the optimal approx- 
imation, but doing so leads to surprisingly small 
errors; for example, if we consider rational ap- 
proximations to sgn(a;) of degree (21, 20) then the 
optimal ^e6bimeB/3ojiOTapeB approximation 
has a maximum error of 4.38 x 10"'^ over 10~^ < 
I a; I < 1, whereas the Higham rational approxima- 
tion of the same order, tanh(20 tanh~^ x), clearly 
has a maximum error of about 1. 

It is worth emphasising the role played by 
HeSbniieB polynomials in the theory of opti- 
mal approximation. These polynomials are de- 
fined by Tn{x) — cos(n cos~^ x), and have ex- 
actly n+l alternating extrema of unit magnitude 
over the interval — 1 < a; < 1. This means that 
x"' — 2^~"'Tn{x) is the best ^e6L.imeB approxima- 
tion to a;" of degree n ~ 2.^ 

There are some advantages of rational over 
polynomial approximations. In most cases the er- 
ror for approximations of a given degree are much 
smaller for rational approximations; for example 
for the Loo[— 1,1] error for |a;| is proportional to 
e"/ for the best rational approximation of de- 
gree n whereas the best polynomial approxima- 

^^cSBiincB polynomials also form an ortlionormal basis 
for L^[— 1, 1] with respect to the weight + x'^, up to 

a trivial factor in the normalisation of To(x). Neverthe- 
less, truncated HcSbiinoB expansions are not optimal in 
general, even for polynomials. 



7. Instability of Symplectic Integrators 

In order to understand some of recent tech- 
niques for decreasing the cost of dynamical 
fermion HMC computations by increasing the in- 
tegration step size we need to briefly review some 
earlier results on the mechanisms which limit 
these step sizes jJHl- 

In order to satisfy detailed balance and thus 
be an exact algorithm we need to use an 
area-preserving reversible MD integration scheme 
within the HMC algorithm. Fortunately symmet- 
ric symplectic integrators, of which the familiar 
leapfrog scheme is the lowest-order example, have 
these desirable properties. They are only exactly 
reversible up to floating-point rounding errors, 
and these errors become important when they are 
exponentially amplifled by the chaotic nature of 
the underlying equations of motion |27I28| . This 
is illustrated in Figure[7| where the JlnnyHOB ex- 
ponent v that characterizes this chaotic amplifi- 
cation is plotted as a function of the integration 
step size St. Notice that i' > VSt in all three 
graphs; this corresponds to the fact the underly- 
ing continuous time equations of motion are in- 
deed chaotic. This interpretation is supported by 
the fact that the ly seems to have a constant value 
independent of St. When St exceeds some crit- 
ical value Stc then v cx St; this corresponds to 
an instability of the leapfrog integrator, a phe- 
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Figure 7. The JlanyHOB exponent v for the 
leapfrog integration shenie for quenched QCD 
(top) , QCD with heavy dynamical fermions (mid- 
dle), and QCD with light dynamical fermions 
(bottom). The exponents were measured as a 
function of the step size St by measuring the vi- 
olation of reversibility caused by rounding errors 
during a single MD trajectory. Note the differ- 
ence in the scales of the bottom graph and the 
other two. 



nomenon which occurs for the leapfrog scheme 
even for a single harmonic oscillator. Stc depends 
on quark mass: for the quenched theory or the 
case of heavy fermions the instability occurs at 
values of 5t so large that the acceptance rate 
is already essentially zero, since SH{Stc) » 1; 
with light fermions the instability sets in for much 
smaller values of 6t, and is what prevents larger 
step sizes from being used. Indeed, this is the rea- 
son why the use of higher-order symmetric sym- 
plectic integrators is not useful; they increase the 
exponent n characterising the growth of the bulk 
contribution, dH cx (5r", but yet again this is only 
an asymptotic expansion and the contributions of 
the form e^^/'^^'^ dominate when St > Stc- 



7.1. Multipseudofermions 

In order to include the dynamical effects of 
fermions we need to evaluate a functional inte- 
gral including the fermionic determinant det M . 
The usual method of doing so is to write this 
as a bosonic functional integral over a pseudo- 
fermion field with kernel M~^, using the identity 
detM cx / d<^*d(f> exp (—(I)* M^^c/)) . We then al- 
ternate Markov steps of (i) selecting a pseudo- 
fermion field from a Gaussian heatbath with the 
gauge fields U held fixed, and of (ii) updating the 
gauge fields, e.g., by using an HMC algorithm, 
while keeping the pseudofcrmion fields fixed. 

This is a perfectly valid procedure, but we are 
introducing extra noise into the system by only 
using a single pseudofcrmion field to sample the 
functional integral. This noise manifests itself as 
fiuctuations in the force exerted by the pseudo- 
fermions on the gauge fields; this increases the 
maximum fermion force, which may trigger the 
integrator instability; and this in turn requires us 
to decrease the integration step size. 

A less noisy estimate is to use the identity 
det — [det Al^^"]" to introduce n multipseudo- 
fermion fields, 

n ^ 

detM ocY\_ / d(j)*d4>j exp {-(p*M~^'"(pj) ; 

this should increase the value of Stc. 

7.1.1. Some Comments on Pseudofermions 

In the limit of an infinite number of multi- 
pseudofermions the force on the gauge fields is 
computed "exactly," but this is not the same as 
computing the force due to detA^, because the 
pseudofermions are updated at beginning of a 
gauge field trajectory, and do not represent det M 
except at the beginning of the trajectory.^ 

It is not at all clear that this is in any way a 
problem, however. Alternating gauge field trajec- 
tories with pseudofcrmion heatbath updates is a 
valid Markov chain with the correct equilibrium 
distribution. Furthermore, the MD ensures that 
the gauge fields evolve along a trajectory in the 
fixed pseudofcrmion background, and this is not 

^Except for Langevin type, or Generalised HMC 
(Kramers) algorithms. 
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the same trajectory as they would take under the 
"instantaneous" force due to detA^. There ap- 
pears to be no reason for the pseudofermion force 
along a trajectory to be larger for one sort of tra- 
jectory that for the other: they are just different 
Markov chains with the same fixed point distri- 
bution. 

7.1.2. Reduction of Maximum Force 

The first method of implementing multipseudo- 
fermions was introduced by Hasenbusch 29 . For 
the case of the Wilson fermion action A4 = 
1 — kH he introduced the quantity Al' = 1 — 
k'H, and used the identity M = M'iM'^^M) 
to write the fermion determinant as detA^ = 
det A^'det(A^'~^A^). He then used a separate 
pseudofermion for each determinant, and tuned 
k' to minimise the cost. 

This method easily generalises to more than 
two pseudofermions, and to more complicated ac- 
tions such as the Wilson-clover action. 

7.1.3. RHMC Force Reduction 

Another way of implementing multipseudo- 
fermions is to use the RHMC technique f ii6.2|l to 
implement nth root of the fermion kernel 
Following the usual procedure for RHMC we use 
a partial fraction expansion of the optimal ratio- 
nal approximation for nth root, and use a multi- 
shift solver to compute all the terms in the same 
KpBiJiOB space. 

The advantage of this approach is the no tun- 
ing is required, as the condition number k(A^) 
is divided equally among all the multipseudo- 
fermions. If we make the naive assumption that 
the cost is proportional to condition number of 
the matrices we need to invert, then the condi- 
tion number for each multipseudofermion kernel 
is K{r{Ai)) = k(A4)^/", so the total fermionic 
force is reduced by factor nK(M)'-'^/")~^. If this 
permits us to increase the step size by the same 
factor then the optimal value nopt ~ In k{M) min- 
imises the cost, with an optimal cost reduction of 
elnre/K. 

Of course, this argument is too naive, we can- 
not reduce force by an arbitrarily large factor just 
by increasing the number of multipseudofermion 
fields n. At some point the fluctuations in the 




^^3 

Figure 8. Trajectories used in the Liischer- 
Sommcr trick. 

force coming from the stochastic estimate of the 
fermion determinant are no longer larger than the 
force itself, at which point increasing n much fur- 
ther will not help. 

8. Reducing 5H Fluctuations 

8.1. New Integrator Tricks 

An interesting "replay" trick for reducing fluc- 
tuations in 5H along an HMC MD trajectory 
has been suggested by Liischer and Sommer 
Consider the situation shown in Figure |S| The 
system starts at point ([/q, ti'o) in phase space, and 
the energy change along the trajectory to (C/i, tti) 
with step size 5t is 5Hq^i. If SHq^i is small, 
say |(5i7o-ti| ^ li then we accept the new config- 
uration with probability min (l,e"*^''-i) or else 
keep the old configuration Uq; but if |(5iJo^i| > 1 
we construct a more accurate trajectory from 
{Uq^ttq) (C/2,71'2) with step size St/2. In or- 
der to ensure reversibility we need to make sure 
that this same reduced step size would be used if 
we started at (U2, —7^2), so we must verify that 
the fictitious energy change along the trajectory 
from {U2, —1^2) {U3, — Ti'a) with step size St has 
energy change \SH2-,3\ > 1. If this is the case 
we accept U2 with probability min 
or else keep Uq, and everything is area-preserving 
and reversible. If |5i?2^3| < 1 then we would not 
try reducing the step size in the reverse direction, 
so we fall back to accepting Uq. By consider- 
ing the cases where the energy changes SHq^i, 
6H2^3, and SHo^2 are negative, between and 
1, and > 1 separately detailed balance is easily 
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verified. 

This procedure is very useful if the system oc- 
casionahy hits a large energy fluctuation, for ex- 
ample if it comes across an integrator instabil- 
ity. If such instabilities occur frequently then we 
must ask whether it would not be preferable just 
to reduce the step size (say to 6t/2) for all tra- 
jectories. Another issue to consider is that often 
these large 5H changes occur as the system is ap- 
proaching equilibrium, and we will be forced to 
reduce St anyhow as the system becomes more 
equilibrated and "sees" the large force due to the 
light fermions. 

8.2. Old Integrator Tricks 

Another interesting trick that is sometimes use- 
ful was introduced by Sexton and Weingarten |H2j 
over a decade ago. Their idea is to split MD 
Hamiltonian into two parts, usually correspond- 
ing to the boson and fermion parts of the ac- 
tion, and to construct a symmetric symplectic 
integrator with larger steps for more expensive 
(fermion) part by a clever application of the 
Baker-Campbell-Hausdorff formula. This helps 
if the step size is limited by cheaper (boson) part, 
but unfortunately becomes less useful as m ^ 
when the step size is limited by the integrator 
instabilities induced by the fermion force. Nev- 
ertheless, it is a useful technique when the large 
contributions to the force are not also the most 
expensive. 

9. Dynamical Chiral Fermions 

We now turn to one of the most important de- 
velopments of recent years: the formulation of on- 
shell chirally symmetric fermions on the lattice. 
Our goal is to show that the various techniques 
developed for this purpose, such as the domain 
wall or overlap formulations, arc equivalent, in 
the sense that they arc just different approxima- 
tions to the same on-shell chiral lattice theory. 
There are many variants of the methods, but they 
all seem to correspond to choosing (explicitly or 
implicitly) a rational approximation to the sgn 
function, and of computing the inverse of the cor- 
responding approximate Neuberger operator. 

Let us fix our notation and conventions: we 



shall work in Euclidean space with Hermitian 7 
matrices; and we shall take all Dirac operators to 
be 75 Hermitian, 0^ = 75^75. 

9.1. On-shell chiral symmetry 

Although it was not the way the subject devel- 
oped historically (a matter that we will not even 
attempt to review here), an elegant logical ap- 
proach 123] is to ask if it is possible to have chi- 
ral symmetry on the lattice without doublers if 
we only insist that the symmetry holds on shell. 
Such a transformation would have to be of the 
form 



Q(l-a0/2)75 



which becomes a chiral transformation on-shell 
where = 0. Note that we are free to specify 
the chiral transformation properties of ip and -i/j 
independently. For this transformation to be a 
symmetry the Dirac operator must be invariant 

^ ga(l-a0/2)75 0ga75(l-a0/2) ^ 

For a small transformation, a <C 1, this implies 
that (1 - ia0)750 + ^75(1 - ia0) = 0, which is 
just the Ginsparg-Wilson (GW) relation 75^ -f- 
075 = a0750- 

9.2. Neuberger's Operator 

We can find a solution of the Ginsparg-Wilson 
relation as follows: let the lattice Dirac operator 
be of the form a0 = 1 + 7575, where 7g — 75, 
thus = 1-1- 7575 — 750^75. This satisfies 
the Ginsparg-Wilson relation if 7I = 1, and it 
has the correct continuum limit — s- ^ if 75 = 
75(0^— 1) -I- O(a^). Both of these conditions are 
satisfied by the choice 



75 



75(0H^-1) 



v/(0^-l)(0w-l) 



sgn[75(0v^ - 1)]. 



There are many other possible solutions, but 
these essentially all correspond to just using a 
different Dirac operator within Neuberger's op- 
erator. 

We should emphasise that we are only consid- 
ering vector-like theories with a chiral symmetry 
here, chiral theories with unpaired Weyl fermions 
can be discretised on the lattice, but getting the 
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phase of the fermion measure correct is critical. 
Simulating such theories is a much harder prob- 
lem than the already challenging computations 
required for the case considered here. 

All numerical techniques for calculating with 
on-shell chiral fermions may be viewed as differ- 
ent ways of approximating Neuberger's operator. 
It is relatively straightforward if somewhat ex- 
pensive to apply the operator, for instance by us- 
ing a rational or polynomial approximation to the 
sgn function, but it is more challenging to apply 
its inverse. The difficulty is that Neuberger's op- 
erator is constructed out of two non-commuting 
operators, 75 and 75, so the inverse of a rational 
approximation is not just the reciprocal rational 
function. 

There are various ways the inverse may be com- 
puted: the most direct approach is by a nested 
KpBiJiOB space approach, where for every outer 
iteration a full inner inversion is required to ap- 
ply the Neuberger operator. 

The obvious problems with this approach are 
(i) the information built up in the inner Kpi.iJiOB 
space is not made use of in the outer KpL.iJiOB 
space; (ii) the accuracy (residual) required for the 
inner solves in order to reach a specified accu- 
racy for the outer solve is not immediately ob- 
vious. Quite a lot of work has gone into devel- 
oping these techniques, including recent studies 
of various preconditioners and the use of "flexi- 
ble inverters" for which the approximation used 
for the preconditioner may vary during the outer 
inversion |:-i4i;-i5i:Wi:H7j . 

9.3. Into Five Dimensions 

An alternative approach to computing the in- 
verse of the Neuberger operator is to express it 
as a single inverse of a five dimensional operator. 
This may done is by linearising a continued frac- 
tion representation of a rational approximation to 
the Neuberger operator ,,38.39) by the introduc- 
tion of a set of ns auxiliary spinor fields; these can 
be introduced either as constraints or as new dy- 
namical variables. From this viewpoint the new 
"fifth" dimension is just an means of approximat- 
ing the sgn function, and the gauge fields are still 
only four dimensional. 

This is similar to the domain wall formula- 



tion and it is possible to translate domain 
wall fermions into an equivalent four-dimensional 
approximation to the Neuberger operator [411421 
I43I44| . To do this we first separate the left- 
and right-handed parts of the domain wall ker- 
nel and cyclically shift the latter by one site in 
the fifth dimension. In the domain wall formal- 
ism the four-dimensional fermion fields have their 
left-handed components on one wall and their 
right-handed components on the other wall, and 
the shift moves them both to the same wall, as 
well as reducing the kernel to a form in which we 
can identify an explicit fifth-dimensional trans- 
fer matrix T . In this form we can integrate out 
the bulk fermion fields, cancelling the Jacobian 
with the contribution from the five-dimensional 
pseudofermion fields, and construct an effective 
four dimensional Hamiltonian H = , which is 
the Euclidean form of a Cayley transform of the 
transfer matrix. This gives a Higham approxima- 
tion sgn a; tanh(rt5 tanh~^ x) to the Neuberger 
operator = i(l -|- 75 sgnH). 

Boriqi also introduced the "truncated overlap" 
for which H — 75^;^, and Chiu has shown how 
to modify the domain wall formalism to give a 
SojiOTapcB approximation in the equivalent ef- 
fective four-dimensional theory |45I46I47| . At this 
meeting Brower, Originos, & Neff 0H1 introduced 
"Mobius fermions," which interpolate between 
domain wall and truncated overlap fermions. 

9.4. Overlap Algorithms 

We see that there are many formulations of 
overlap fermions, all of which satisfy the GW rela- 
tion, and which are equivalent in continuum limit. 
The principal differences between them are that 
they use different lattice Dirac operators within 
the sgn function, and different approximations to 
the sgn function. This is presumably also true for 
perfect actions. The choice between the formula- 
tions is essentially just a trade-off between com- 
putational speed and the amount of chiral sym- 
metry breaking. 

9.5. Dynamical Overlap 

To date, most applications of on-shell chiral 
fermions have been for valence computations, but 
of course we need to introduce dynamical (sea) 
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overlap fermions too in order to be able to ob- 
tain reliable chiral results from full QCD. The 
first studies of dynamical overlap were carried out 
in the Schwinger model " 49150) several years ago. 
Large-scale dynamical domain-wall QCD compu- 
tations have been carried out by the RBC col- 
laboration < 51I52I53] . Recently Fodor, Katz, & 
Szabo 54 5^ and Cundy, Krieg, Frommer, Lip- 
pert, & Schilling [SHj have studied dynamical 
overlap fermions in QCD on "ridiculously small 
lattices." 

Apart from the issues in common with va- 
lence overlap computations that we have dis- 
cussed previously, HMC computations involving 
overlap fermions require the computation of the 
MD force. In particular, there is an issue con- 
cerning what to do when one of the eigenvalues 
of 75014^ passes through zero, which for continu- 
ous time MD evolution with an exact Neuberger 
operator would lead to a singular force. 

If a polynomial or rational approximation to 
the sgn function is used then this is readily differ- 
entiable and leads to a computable force term, so 
one method of handling the singular force is just 
to observe that there is no discontinuity in the 
approximations to the sgn function. This is effec- 
tively the method used in the dynamical domain 
wall computations, where the Higham approxi- 
mation is implicitly used and which, for prac- 
ticable values of n^, smoothes the discontinuity 
enough to keep the force under control. Of course, 
this is just a reflection of the fact that chiral sym- 
metry is not very well approximated. Another 
approach is to modify the leapfrog integrator 
to reflect or refract at a zero crossing in such a 
way that the MD process is still exactly reversible 
and area preserving. It will be interesting which, 
if any, of these approaches avoids the HMC accep- 
tance rate becoming small due to zero crossings. 

We also note the suggestion [S3 of working in 
the chiral sector with no zero modes, and ex- 
plicitly reweighting the configurations with m^^ 
where Q is the topological charge. 

10. Conclusions 

We are at an interesting point in the develop- 
ment of algorithms for dynamical fcrmion com- 



putations in QCD. On the one hand it is now 
generally accepted that quenching leads to sig- 
nificant systematic errors, and that to proceed 
towards reliable higher accuracy calculations we 
need to include dynamical fermions one way or 
another. On the other hand, there is much de- 
bate about whether the systematic errors of two 
or three flavour staggered quark simulations are 
under control. Recent progress with dynamical 
QCD^^' and GW fermions also show that the 
prospects for carrying out large-scale dynamical 
computations with light quarks, with all sources 
of systematic errors under control, and even with 
good chiral symmetry properties should be ex- 
pected with the current or next generation of 
computers. 
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